A Statistical Model for Measuring Structural Similarity between Webpages
Authors
Abstract
This paper presents a statistical model for measuring structural similarity between webpages from bilingual websites. Starting from basic assumptions, we derive the model and propose an algorithm to estimate its parameters in an unsupervised manner. The statistical approach appears to benefit the structural similarity measure: in the task of distinguishing parallel webpages from bilingual websites, our language-independent model demonstrates an F-score of 0.94–0.99, which is comparable to the results of language-dependent methods involving content similarity measures.
Similar resources
A Novel Architecture for Detecting Phishing Webpages using Cost-based Feature Selection
Phishing is one of the luring techniques used to exploit personal information. A phishing webpage detection system (PWDS) extracts features to determine whether it is a phishing webpage or not. Selecting appropriate features improves the performance of PWDS. Performance criteria are detection accuracy and system response time. The major time consumed by PWDS arises from feature extraction that ...
A novel method for detecting structural damage based on data-driven and similarity-based techniques under environmental and operational changes
The applications of time series modeling and statistical similarity methods to structural health monitoring (SHM) provide promising and capable approaches to structural damage detection. The main aim of this article is to propose an efficient univariate similarity method named as Kullback similarity (KS) for identifying the location of damage and estimating the level of damage severity. An impr...
Information-Theoretic Approaches for Measuring the Structural Similarity of Semistructured Documents
We propose and experimentally evaluate different approaches for measuring the structural similarity of semistructured documents based on information-theoretic concepts. Common to all approaches is a two-step procedure: first we extract and linearize the structural information from documents and then we use similarity measures that are based on, respectively, Kolmogorov complexity and Shannon entr...
Cluster-Based Image Segmentation Using Fuzzy Markov Random Field
Image segmentation is an important task in image processing and computer vision which attracts many researchers' attention. There are a couple of information sets for pixels in an image: statistical and structural information, which refer to the feature value of pixel data and local correlation of pixel data, respectively. Markov random field (MRF) is a tool for modeling statistical and structural inf...
Reusing Models of Different Abstraction Levels
Reuse of models assists in constructing a new model on the basis of existing knowledge, by retrieving a model that matches a preliminary partial input model. It often employs similarity measures for identifying reusable models that are structurally and semantically similar to the input model. However, in many cases the preliminary input model is of a higher level of abstraction than the detaile...
Publication date: 2015